Internet Engineering Task Force Audio-Video Transport Working Group
INTERNET-DRAFT H. Schulzrinne
draft-ietf-avt-profile-03.txt AT&T Bell Laboratories
October 20, 1993
Expires: 12/31/93
Sample Profile and Encodings for the Use of RTP for Audio and Video
Conferences with Minimal Control
Status of this Memo
This document is an Internet Draft. Internet Drafts are working documents
of the Internet Engineering Task Force (IETF), its Areas, and its Working
Groups. Note that other groups may also distribute working documents as
Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other documents
at any time. It is not appropriate to use Internet Drafts as reference
material or to cite them other than as a "working draft" or "work in
progress."
Please check the I-D abstract listing contained in each Internet Draft
directory to learn the current status of this or any other Internet Draft.
Distribution of this document is unlimited.
Abstract
This note describes a profile for the use of the real-time
transport protocol (RTP) and the associated control protocol, RTCP,
within audio and video multiparticipant conferences with minimal
control. It provides interpretations of generic fields within the
RTP specification suitable for audio and video conferences. In
particular, this document defines a set of default mappings from
format index to encodings.
The document also describes how audio and video data may be
carried within RTP. It defines a set of standard encodings and
their names when used within RTP. However, the definitions are
independent of the particular transport mechanism used. The
descriptions provide pointers to reference implementations and
the detailed standards. This document is meant as an aid
for implementors of audio, video and other real-time multimedia
applications.
INTERNET-DRAFT draft-ietf-avt-profile-03.txt October 20, 1993
Contents
1 Introduction
2 Demultiplexing
3 Audio
3.1 Encoding-independent recommendations
3.2 Recommended Audio Encodings
3.3 The RTCP FMT Option for Audio
3.4 Port Assignment
4 Video
4.1 The RTCP FMT Option for Video
4.2 Port Assignment
5 Miscellaneous
6 Address of Author
1 Introduction
This profile defines aspects of RTP left unspecified in the RTP protocol
definition (RFC TBD). This profile is intended for use within audio and
video conferences with minimal session control. In particular, no support
for the negotiation of parameters or membership control is provided. Other
profiles may make different choices for the items specified here. The
profile specifies the use of RTP over unicast and multicast UDP as well
as ST-II. For unicast UDP and ST-II, references to multicast addresses
are to be ignored. The use of this profile is indicated by the use of
a media-specific well-known port number. The profile may also be used
with other port numbers. For example, the use of a particular session
announcement tool could imply use of this profile.
H. Schulzrinne Expires 12/31/93 [Page 2]
2 Demultiplexing
For applications which choose to share a single network destination address
and port for both audio and video, the default channel identifier for audio
is 0 and for video is 1. In that case, the port number for audio is used.
This combination should only be used when it is known that all receiving
applications can properly demultiplex audio and video.
3 Audio
3.1 Encoding-independent recommendations
The following recommendations are default operating parameters. Applications
should be prepared to handle other values. The ranges
given are meant to give guidance to application writers, allowing a set
of applications conforming to these guidelines to interoperate without
additional negotiation. These guidelines are not intended to restrict
operating parameters for applications that can negotiate a set of
interoperable parameters, e.g., through a conference control protocol.
For packetized audio, the default packetization interval should have a
duration of 20 ms, unless otherwise noted in Table 1. The packetization
interval determines the minimum end-to-end delay; longer packets introduce
less header overhead but higher delay and make packet loss more noticeable.
For non-interactive applications such as lectures, or for links with severe
bandwidth constraints, a higher packetization delay may be appropriate. For
frame-based encodings (marked as F in Table 1) such as LPC, CELP
and GSM, the sender may choose to combine several frame intervals into a
single message. The receiver can tell the number of frames contained in a
message since the frame duration is defined as part of the encoding.
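As a rough illustration of the packetization arithmetic, the following Python sketch computes per-packet sample counts and payload sizes. It assumes the nominal per-channel rates of Table 1 (for example, 8 bits per sample for PCMU and 4 bits per sample for G721 at 8 kHz) and counts the media payload only, not RTP, UDP or IP headers; the function names are illustrative, not part of any specification.

```python
def samples_per_packet(rate_hz: int, interval_ms: int) -> int:
    """Samples per channel in one packetization interval."""
    total = rate_hz * interval_ms
    if total % 1000:
        raise ValueError(f"{interval_ms} ms at {rate_hz} Hz is not a whole number of samples")
    return total // 1000


def payload_octets(rate_hz: int, interval_ms: int, bits_per_sample: int,
                   channels: int = 1) -> int:
    """Media payload size in octets for a sample-based encoding."""
    bits = samples_per_packet(rate_hz, interval_ms) * bits_per_sample * channels
    if bits % 8:
        raise ValueError("payload is not a whole number of octets")
    return bits // 8


print(payload_octets(8000, 20, 8))        # PCMU, 20 ms: 160 octets
print(payload_octets(8000, 20, 4))        # G721, 20 ms: 80 octets
print(payload_octets(44100, 20, 16, 2))   # L16 stereo at 44.1 kHz, 20 ms: 3528 octets
```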
If multiple channels are used, the left channel information always precedes
the right-channel information. For more than two channels, the convention
of the AIFF-C audio interchange format should be followed. (The
AIFF-C specification is available by anonymous ftp at ftp.sgi.com in the
file sgi/aiff-c.9.26.91.ps.) For two-channel stereo, the sequence is left,
right; for three channels, left, right, center; for quadraphonic systems,
front left, front right, rear left, rear right; for four-channel systems,
left, center, right, and surround sound; for six-channel systems, left, left
center, center, right, right center and surround sound.
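The ordering rules above can be sketched in Python as follows. The text states only that left precedes right; this sketch assumes sample-by-sample interleaving within a packet, as in AIFF-C sound data, and the layout names and helper functions are hypothetical.

```python
# Channel order per layout, as listed in the text (AIFF-C convention for
# more than two channels). Layout names are hypothetical labels.
CHANNEL_ORDER = {
    "stereo": ["left", "right"],
    "three-channel": ["left", "right", "center"],
    "quadraphonic": ["front left", "front right", "rear left", "rear right"],
    "four-channel": ["left", "center", "right", "surround"],
    "six-channel": ["left", "left center", "center", "right",
                    "right center", "surround"],
}


def interleave(channels: list[list[int]]) -> list[int]:
    """Merge per-channel sample lists, one sample per channel in turn
    (assumed sample-by-sample interleaving)."""
    if not channels or len({len(c) for c in channels}) != 1:
        raise ValueError("need one or more channels of equal length")
    return [s for frame in zip(*channels) for s in frame]


def deinterleave(samples: list[int], n_channels: int) -> list[list[int]]:
    """Split an interleaved sequence back into per-channel lists."""
    if len(samples) % n_channels:
        raise ValueError("sample count is not a multiple of the channel count")
    return [samples[i::n_channels] for i in range(n_channels)]


left, right = [1, 2, 3], [10, 20, 30]
print(interleave([left, right]))   # [1, 10, 2, 20, 3, 30]
```

A receiver that plays only a single channel, as permitted in Section 3.2, could call deinterleave() and keep the first list.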
The sampling frequency should be drawn from the set: 8, 11.025, 16, 22.05,
44.1 and 48 kHz.
3.2 Recommended Audio Encodings
Table 1 shows the names, types (sample vs. frame oriented), per-channel
bit rates and default sampling frequencies of recommended encodings. The
list is partially drawn from the document "Recommended practices for
enhancing digital audio compatibility in multimedia systems", published by
the Interactive Multimedia Association, Version 3.00, Oct. 1992 (referenced
as [IMA]). The names are for identification only; they correspond to the
names used within the Real-Time Transport Protocol (RTP). Other applications
may choose different names. Note that the L16 encoding may be used with
different sampling rates. The CCITT changed its name in 1993 to ITU-T; to
limit confusion, both the old and new names are used.
name  nom. sampling  bit rate  type  frame  description
      rate (kHz)     (kb/s)    S/F   (ms)
_________________________________________________________________________
L16   48             768       S            16-bit linear, 2's complement
L16   44.1           705.6     S
L16   22.05          352.8     S
L16   11.025         176.4     S
G722  16             64        S            CCITT/ITU-T subband ADPCM
PCMU  8              64        S            CCITT/ITU-T mu-law PCM
PCMA  8              64        S            CCITT/ITU-T A-law PCM
G721  8              32        S            CCITT/ITU-T ADPCM
IDVI  8              32        S            Intel/DVI ADPCM [IMA]
G723  8              24        S            CCITT/ITU-T ADPCM
GSM   8              13        F     20     RPE/LTP GSM 06.10
1016  8              4.8       F     30     CELP
_________________________________________________________________________
Table 1: Audio encodings
For multi-octet encodings, octets are transmitted in network byte order
(i.e., most significant octet first).
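A minimal Python sketch of L16 packing, assuming only what the text states: 16-bit two's complement samples sent most significant octet first. The "!" format prefix of the standard struct module selects network byte order; the helper names are hypothetical.

```python
import struct


def encode_l16(samples: list[int]) -> bytes:
    """Pack 16-bit two's complement samples, most significant octet first."""
    return struct.pack(f"!{len(samples)}h", *samples)


def decode_l16(payload: bytes) -> list[int]:
    """Unpack an L16 payload back into signed 16-bit samples."""
    if len(payload) % 2:
        raise ValueError("L16 payload must contain an even number of octets")
    return list(struct.unpack(f"!{len(payload) // 2}h", payload))


print(encode_l16([1, -1, 32767, -32768]).hex())   # 0001ffff7fff8000
```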
A detailed description of the encodings is given below. The names shown
(L16, PCMU, etc.) are limited to four characters and are suitable for
identification in protocols such as RTP (RFC TBD).
L16: denotes uncompressed audio data, using 16-bit signed representation
with 65535 equally divided steps between minimum and maximum signal
level, ranging from -32768 to 32767. The value is represented in two's
complement notation.
PCMU: specified in CCITT/ITU-T recommendation G.711. Audio data is encoded
as eight bits per sample, after companding. Code to convert between
linear and mu-law companded data is available in the IMA document.
PCMA: specified in CCITT/ITU-T recommendation G.711. Audio data is encoded
as eight bits per sample, after companding. Code to convert between
linear and A-law companded data is available in the IMA document.
G721 through G729: specified in the corresponding CCITT/ITU-T
recommendations. Reference implementations for G.721 and G.723 are
available as part of the CCITT/ITU-T Software Tool Library (STL) from the
ITU General Secretariat, Sales Service, Place des Nations, CH-1211
Geneve 20, Switzerland. The library is covered by a license
and is available for anonymous ftp on gaia.cs.umass.edu, file
pub/ccitt/ccitt_tools.tar.Z.
GSM: (Groupe Spécial Mobile) denotes the European GSM 06.10 provisional
standard for full-rate speech transcoding, prI-ETS 300 036, which
is based on RPE/LTP (regular pulse excitation/long term prediction)
coding at a rate of 13 kb/s. A reference implementation was written by
Carsten Bormann and Jutta Degener (TU Berlin, Germany) and is available
for anonymous ftp from tub.cs.tu-berlin.de, directory tub/tubmik.
1016: uses code-excited linear prediction (CELP) and is specified in
Federal Standard FED-STD 1016, published by the Office of Technology
and Standards, Washington, DC 20305-2010.
The U. S. DoD's Federal-Standard-1016 based 4800 bps code excited
linear prediction voice coder version 3.2 (CELP 3.2) Fortran and
C simulation source codes are available for worldwide distribution
at no charge (on DOS diskettes, but configured to compile on Sun
SPARC stations) from: Bob Fenichel, National Communications System,
Washington, D.C. 20305, phone +1-703-692-2124, fax +1-703-746-4960.
Example input and processed speech files, a technical information
bulletin, and the official standard "Federal Standard 1016,
Telecommunications: Analog to Digital Conversion of Radio Voice by 4,800
bit/second Code Excited Linear Prediction (CELP)" are included at no
charge. According to Vincent Cate (Carnegie Mellon), the distribution
is also available for anonymous ftp at furmint.nectar.cs.cmu.edu
(128.2.209.111) in directory celp.audio.compression.
The following articles describe the Federal-Standard-1016 4.8-kbps
CELP coder:
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The
Proposed Federal Standard 1016 4800 bps Voice Coder: CELP," Speech
Technology Magazine, April/May 1990, p. 58-64.
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The
Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal
Processing, Academic Press, 1991, Vol. 1, No. 3, p. 145-155.
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The
DoD 4.8 kbps Standard (Proposed Federal Standard 1016)," in Advances
in Speech Coding, ed. Atal, Cuperman and Gersho, Kluwer Academic
Publishers, 1991, Chapter 12, p. 121-133.
Copies of the FS-1016 document are available for $2.50 each from:
GSA Rm 6654
7th & D St SW
Washington, D.C. 20407
1-202-708-9205
DVI: is specified in the "Recommended Practices for Enhancing Digital Audio
Compatibility in Multimedia Systems", published by the Interactive
Multimedia Association (IMA), Annapolis, MD. The document also contains
reference implementations for mu-law to 16-bit, ADPCM and sample rate
conversions.
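The PCMU entry above defers to the IMA document for conversion code. For readers without it at hand, the following Python sketch shows the widely circulated bias-and-clip formulation of G.711 mu-law companding for 16-bit linear samples. It is an illustration rather than the normative G.711 tables; the constants BIAS = 0x84 and CLIP = 32635 come from that common formulation, and the function names are hypothetical.

```python
BIAS = 0x84    # 132, added to the magnitude before the segment search
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Compress one 16-bit two's complement sample to an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = magnitude.bit_length() - 8              # segment number 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F    # 4 bits within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # codes are transmitted inverted


def ulaw_to_linear(code: int) -> int:
    """Expand an 8-bit mu-law code back to a 16-bit linear value."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


print(hex(linear_to_ulaw(0)), hex(linear_to_ulaw(32767)))   # 0xff 0x80
```

Because code 0x7F decodes to zero, the same value as 0xFF, it re-encodes as 0xFF; every other code survives a decode/encode round trip.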
For sample-based encodings, a receiver should accept packets representing
between 0 and 200 ms of audio data.(1) Receivers should be prepared to
accept multi-channel audio, but may choose to play only a single channel.
All block-oriented audio codecs should be able to encode and decode several
consecutive blocks within a single packet. Since the frame size of each
block-oriented codec is fixed, no separate designation is needed for the
same encoding with a different number of blocks per
packet.
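The 200 ms bound and the frame count can be illustrated with a short Python sketch. The CELP frame size of 18 octets is derived here from the nominal 4.8 kb/s rate and 30 ms frame in Table 1 (4800 x 0.03 = 144 bits); this draft does not itself define the octet layout of a frame, so treat that constant, and the helper names, as assumptions.

```python
MAX_PACKET_MS = 200   # upper bound a receiver should accept (sample-based encodings)


def max_sample_payload(rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Largest payload, in octets, for 200 ms of a sample-based encoding."""
    return rate_hz * MAX_PACKET_MS // 1000 * bits_per_sample * channels // 8


# Assumed CELP (1016) frame size: 4.8 kb/s for 30 ms = 144 bits = 18 octets.
CELP_FRAME_MS = 30
CELP_FRAME_OCTETS = 4800 * CELP_FRAME_MS // 1000 // 8


def frames_in_packet(payload_len: int, frame_octets: int) -> int:
    """Number of fixed-size frames carried in one packet."""
    if payload_len % frame_octets:
        raise ValueError("payload is not a whole number of frames")
    return payload_len // frame_octets


print(max_sample_payload(48000, 16, 2))   # 38400 octets for 200 ms of L16 stereo
n = frames_in_packet(54, CELP_FRAME_OCTETS)
print(n, n * CELP_FRAME_MS)               # 3 90 (three frames, 90 ms)
```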
3.3 The RTCP FMT Option for Audio
Unless specified with the FMT option, the mapping between the format field
in an RTP packet and audio encodings, sampling rates and channel counts is
specified by Table 2.
Format values of 31 and below cannot be redefined by FMT options. In other
words, only values of 32 and above are valid in the format field within an
FMT option. The receiver is expected to discard RTP packets containing
media data with unknown format field values. Sites are expected to keep
the mapping between format and encoding constant, so that lost packets
containing FMT options do not lead the receiver to misinterpret media data.
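A receiver might track the format mapping as in the following Python sketch. The static entries are taken from Table 2; keeping one such table per sender, the class and method names, and the experimental name "XFOO" are hypothetical, and the sketch does not enforce any upper limit on format values.

```python
# Static mapping from Table 2: format value -> (name, sampling rate in Hz, channels).
STATIC_AUDIO_FORMATS = {
    0: ("PCMU", 8000, 1),
    1: ("1016", 8000, 1),
    2: ("G721", 8000, 1),
    3: ("GSM", 8000, 1),
    4: ("G723", 8000, 1),
    5: ("IDVI", 8000, 1),
    10: ("L16", 44100, 2),
}


class FormatTable:
    """Format mapping for one sender (hypothetical helper)."""

    def __init__(self) -> None:
        self.dynamic: dict[int, tuple[str, int, int]] = {}

    def apply_fmt_option(self, fmt: int, name: str, rate_hz: int, channels: int) -> None:
        """Record a mapping announced in an FMT option."""
        if fmt <= 31:
            raise ValueError("format values 31 and below cannot be redefined")
        if channels == 0:
            raise ValueError("a channel count of zero is invalid")
        self.dynamic[fmt] = (name, rate_hz, channels)

    def lookup(self, fmt: int):
        """Return (name, rate, channels), or None if the packet should be discarded."""
        if fmt <= 31:
            return STATIC_AUDIO_FORMATS.get(fmt)
        return self.dynamic.get(fmt)


table = FormatTable()
table.apply_fmt_option(40, "XFOO", 16000, 1)
print(table.lookup(0), table.lookup(40), table.lookup(7))
# ('PCMU', 8000, 1) ('XFOO', 16000, 1) None
```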
Additional standard encodings may be registered with the Internet Assigned
Numbers Authority (IANA). The format name is intended to describe the format
in an unambiguous way; it is interpreted as a sequence of four ASCII
characters, with uppercase and lowercase characters treated as distinct.
Format names beginning with the letter 'X' are reserved for experimental use
and not subject to registration. These experimental encodings may be mapped
to format values 32 and above using the FMT option. Additional standard
mappings to format values of 31 and below may also be registered with IANA.
Registered assignments are published periodically in the Assigned Numbers
RFC.
------------------------------
1. This restriction allows reasonable buffer sizing for the receiver.
Within the FMT option, the format name is followed by a channel count field
and a sampling rate field, the latter measured in samples per second.(2) A
channel count of zero is invalid. A packetization interval of 20 ms (40 ms
at 11.025 kHz, where 20 ms holds 220.5 samples) or a multiple thereof is
suggested, as it leads to integral sample counts
for all common sampling rates.
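The 20 ms suggestion can be checked against the sampling rates listed in Section 3.1 with a few lines of Python; the function name is illustrative. At 11.025 kHz a 20 ms interval holds 220.5 samples, so only multiples of 40 ms give whole sample counts at every listed rate.

```python
RATES_HZ = [8000, 11025, 16000, 22050, 44100, 48000]   # Section 3.1


def non_integral_rates(interval_ms: int, rates=RATES_HZ) -> list[int]:
    """Rates at which interval_ms does not hold a whole number of samples."""
    return [r for r in rates if (r * interval_ms) % 1000]


print(non_integral_rates(20))   # [11025]
print(non_integral_rates(40))   # []
```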
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F| FMT | length |0|0| format | reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| name of format |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| channels | sampling rate (Hz) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
... encoding specific parameters ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: FMT option for audio encodings
3.4 Port Assignment
ST-II SAP and UDP port 5005 is the default destination for multicast
real-time audio data carried by RTP for this profile.
A fixed port number is useful as it is less likely than a randomly chosen
port number to be already in use by another application at one or more of
the intended destination hosts. Also, fixed port numbers allow traffic
statistics to be collected and may simplify firewall implementations. A
single fixed port number requires that hosts allow several processes to use
a single UDP port with different multicast addresses. (The particular port
number was chosen to lie in the range above 5000 to accommodate port number
allocation practice within the Unix operating system, where port numbers
below 1024 can only be used by privileged processes and port numbers between
1024 and 5000 are automatically assigned by the operating system.)
Unicast connections may use this port or a set of mutually agreed-upon port
numbers.
------------------------------
2. Specifying fractional samples per second was considered excessive, as the
typical crystal accuracy of 100 ppm translates into a sampling rate
inaccuracy of about one Hz or more.

index  encoding  sampling rate  channels
       name      (kHz)
__________________________________________
0      PCMU      8              1
1      1016      8              1
2      G721      8              1
3      GSM       8              1
4      G723      8              1
5      IDVI      8              1
10     L16       44.1           2
__________________________________________
Table 2: Standard audio encodings
4 Video
The following video encodings are currently defined, with their abbreviated
names used for identification:
CPV: This encoding, "Compressed Packet Video", is implemented by Concept,
Bolter, and ViewPoint Systems video codecs.
JPEG: The encoding is specified in ISO Standards DIS 10918-1 and DIS
10918-2. The data is formatted according to the JFIF (JPEG File
Interchange Format) defined by C-Cube Microsystems.
H261: The encoding is specified in CCITT/ITU-T standard H.261. The
packetization and RTP-specific properties are described in RFC TBD.
nv: The encoding is implemented in the program 'nv' developed at Xerox PARC
by Ron Frederick.
CUSM: The encoding is implemented in the program CU-SeeMe developed at
Cornell University by Dick Cogger, Scott Brim, Tim Dorcey and John
Lynn.
PicW: The encoding is implemented in the program PictureWindow developed at
Bolt, Beranek and Newman (BBN).
4.1 The RTCP FMT Option for Video
Unless specified with the RTCP FMT option, the mapping between the format
field in an RTP packet and the video encoding is specified by Table 3. The
second paragraph of Section 3.3 applies to video as well.
Within the video FMT option, a one-octet numeric version identifier further
describes the encoding. Unless otherwise defined, the version identifier
has the value zero.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F| FMT | length |0|0| format | reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| name of format |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version | encoding-specific parameters |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
... encoding-specific parameters ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: FMT option for video encodings
number name
______________
26 JPEG
27 CUSM
28 nv
29 PicW
30 Bolt
31 H261
Table 3: Format values for standard video encodings
4.2 Port Assignment
ST-II SAP and UDP port 5006 is the default destination for multicast
real-time video data carried by RTP for this profile. The remainder of
section 3.4 applies.
5 Miscellaneous
RTCP messages should be sent periodically, with a period varying randomly
around a set mean to avoid synchronized bursts of RTCP packets. (For
example, the time between messages could vary uniformly between one half and
1.5 times the mean.) The average period between transmissions determines
the additional network load due to RTCP packets and also determines how
long it will take a new arrival to discover the identities of the other
conference participants. The average period should be chosen such that no
more than a small fraction (say, 1%) of the media bandwidth is consumed by
RTCP messages from all sources, with a minimum period of a few seconds.
By scaling the mean period with the (slowly increasing) number of observed
participants, a new conference participant will quickly inform all
other participants of its arrival and then slow its announcement rate.
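The timing rule above can be sketched in Python as follows. The sketch assumes that every participant sends RTCP packets of the same average size at the same mean rate, and uses 5 s as a stand-in for "a few seconds"; both values, and the function name, are hypothetical choices rather than part of this profile.

```python
import random


def rtcp_interval(participants: int, media_bw_bps: float, rtcp_packet_bits: float,
                  fraction: float = 0.01, min_interval_s: float = 5.0,
                  rng=random) -> float:
    """Randomized time until the next RTCP message, in seconds (sketch)."""
    # Mean period such that all participants together stay within `fraction`
    # of the media bandwidth; it grows with the number of observed participants.
    mean = participants * rtcp_packet_bits / (fraction * media_bw_bps)
    mean = max(mean, min_interval_s)
    # Vary uniformly between 0.5 and 1.5 times the mean to avoid synchronized bursts.
    return rng.uniform(0.5 * mean, 1.5 * mean)


print(rtcp_interval(100, 64000, 800, rng=random.Random(1)))  # somewhere in [62.5, 187.5]
```

For example, 100 participants in a 64 kb/s PCMU conference sending 100-octet (800-bit) RTCP packets give a mean of 100 x 800 / 640 = 125 s, so each interval falls between 62.5 and 187.5 s. A newcomer that has so far observed only one other participant would instead use the 5 s floor and announce itself quickly.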
6 Address of Author
Henning Schulzrinne
AT&T Bell Laboratories
MH 2A244
600 Mountain Avenue
Murray Hill, NJ 07974-0636
telephone: +1 908 582 2262
facsimile: +1 908 582 5809
electronic mail: hgs@research.att.com